Skip to content

Add test for vk_buffer from host memory#19254

Open
sredman wants to merge 5 commits intoggml-org:masterfrom
sredman:work/sredman/vulkan-external-memory-host
Open

Add test for vk_buffer from host memory#19254
sredman wants to merge 5 commits intoggml-org:masterfrom
sredman:work/sredman/vulkan-external-memory-host

Conversation

@sredman
Copy link
Copy Markdown
Contributor

@sredman sredman commented Feb 1, 2026

Add a test to validate external host memory can correctly be allocated, written, used, and read.

I already had this test for my own external memory buffer solution, but then I noticed #18467 had done the work. I thought the test might still be useful.

@sredman sredman requested a review from 0cc4m as a code owner February 1, 2026 20:50
@github-actions github-actions bot added Vulkan Issues specific to the Vulkan backend ggml changes relating to the ggml tensor library for machine learning labels Feb 1, 2026
@sredman sredman changed the title Work/sredman/vulkan external memory host Add test for vk_buffer from host memory Feb 1, 2026
@0cc4m
Copy link
Copy Markdown
Contributor

0cc4m commented Feb 5, 2026

I don't mind adding further backend-internal tests, but I think we should move them out of the main file. I plan to look into splitting it up in the future. Maybe we can start here by adding a ggml-vulkan-tests.cpp and .hpp? You don't have to move my tests as well, but you can if you want.

@sredman
Copy link
Copy Markdown
Contributor Author

sredman commented Feb 7, 2026

I don't mind adding further backend-internal tests, but I think we should move them out of the main file. I plan to look into splitting it up in the future. Maybe we can start here by adding a ggml-vulkan-tests.cpp and .hpp? You don't have to move my tests as well, but you can if you want.

Do you have the shape of a plan for this?

When I was originally working on this test, I tried to make test file in the ggml/tests path. That would have required an extensive refactoring in order to be able to reference the internal functions. We would need to mark any functions we want to call as non-static and export them. However, the more messy problem was that we'd need to extract all the struct definitions which we want to use, which is quite an undertaking. It could certainly be done, but would definitely need to be prepared and checked in quickly to avoid mass merge conflicts 🙂

@0cc4m
Copy link
Copy Markdown
Contributor

0cc4m commented Feb 9, 2026

That makes sense, I hadn't looked into it enough yet. Then I'll postpone that until I get to it myself. However, please move your test into the existing #ifdef GGML_VULKAN_RUN_TESTS area where the other tests are, including the call. The file is already way too large, I would like to avoid test code spread around everywhere.

@sredman
Copy link
Copy Markdown
Contributor Author

sredman commented Feb 16, 2026

@0cc4m , I have moved the test declaration into the preexisting block of tests, and the call into the preexisting block of calls (in ggml_vk_preallocate_buffers).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this abort needs to be commented out in order to run the new tests.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to remove it, I must have accidentally merged it. You can also move your test to the top, provided it fails non-fatally. The matmul tests take a while to run, and I haven't really needed them anymore in a long time.

Copy link
Copy Markdown
Contributor

@0cc4m 0cc4m left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry about the delay. I ran the test and it works. It showed me that buffer_from_host_ptr does not work on my Intel iGPU, at least not in this way. I went through the code and added some comments. Once those are resolved it's good to merge.

Comment on lines +12101 to +12106
static std::vector<vk_device> tested_devices;
if (std::find(tested_devices.begin(), tested_devices.end(), device) == tested_devices.end()) {
tested_devices.push_back(device);
} else{
return; // Already tested this device, skip
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The existing tests just run once and abort, it's not intended to keep them active during runtime, so this test isn't necessary.

return; // Already tested this device, skip
}

VK_LOG_DEBUG("ggml_vk_test_single_device_external_memory(" << device->name << ") - Single-device external host memory test");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The debug name is inconsistent with the function name, and the additional text is unusual. Was that to make it easier to spot in the debug log?

Comment on lines +12124 to +12127
std::unique_ptr<float[], decltype(input_deleter)> input_host_memory(
static_cast<float*>(aligned_alloc(min_alignment, aligned_size)),
input_deleter
);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a little convoluted compared to just going float * input_host_memory = (float *)aligned_alloc(...) and adding the free() call to the end, but if you prefer this, I don't mind.

However, to make it easier to understand what is going on, please check if the alloc succeeded and print an error message if not. Same for the other alloc below.

vk_buffer device_dst_buffer = ggml_vk_create_buffer_device(device, test_buffer_size);

// Allocate an external memory buffer on device, importing the host memory
vk_buffer external_source_buffer = ggml_vk_buffer_from_host_ptr(device, input_host_memory.get(), test_buffer_size);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thing as above, to make this easier to understand please check if this actually succeeded and print an error if not. In my test it returned a nullptr and this just crashed the copy_async functions below, so I needed a debugger to figure out which function failed.

free(d_chk);
}

static void ggml_vk_test_single_device_buffer_from_host_ptr(vk_device& device) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
static void ggml_vk_test_single_device_buffer_from_host_ptr(vk_device& device) {
static void ggml_vk_test_device_buffer_from_host_ptr(vk_device& device) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to remove it, I must have accidentally merged it. You can also move your test to the top, provided it fails non-fatally. The matmul tests take a while to run, and I haven't really needed them anymore in a long time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ggml changes relating to the ggml tensor library for machine learning Vulkan Issues specific to the Vulkan backend

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants